29. Remove, Repeat
Question:
This word seems like an outlier in a certain sense, so let’s remove it and refit. Go back to text_learning/vectorize_text.py and remove this word from the emails, using the same method you used to remove “sara”, “chris”, etc. Rerun vectorize_text.py, and once that finishes, rerun find_signature.py.
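The word-removal step can be sketched as a small filter over each email's text. This is a hedged illustration, not the course's own code. It assumes each email has already been reduced to a space-separated string of (stemmed) words, and `remove_signature_words`, `SIGNATURE_WORDS` and `"newoutlier"` are hypothetical names, with `"newoutlier"` standing in for the word you just found.

```python
# Sketch: remove signature words from an email's text before vectorizing.
# Assumptions: the text is already a space-separated string of (stemmed) words,
# and "newoutlier" is a hypothetical placeholder for the word you found.
SIGNATURE_WORDS = ["sara", "chris", "newoutlier"]


def remove_signature_words(text, words=SIGNATURE_WORDS):
    """Return `text` with every whitespace-separated token in `words` dropped."""
    blocked = set(words)
    kept = [token for token in text.split() if token not in blocked]
    return " ".join(kept)


if __name__ == "__main__":
    email = "thank sara for the forecast newoutlier"
    print(remove_signature_words(email))  # prints: thank for the forecast
```

This sketch matches whole tokens, so a longer word such as "saravanan" is left intact, whereas a plain `str.replace("sara", "")` would turn it into "vanan". If your earlier removal used a different technique, the exercise asks you to keep using that one, so every removed word is handled the same way.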
Do any other outliers pop up? What word is it?
Does it seem like a signature-type word? (As before, define an outlier as a feature with importance > 0.2.)
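The outlier check (importance > 0.2) can be sketched as a scan over the importances, mapping each index back to its word. This is a minimal, stdlib-only illustration under assumptions: `find_outlier_features` is a hypothetical helper, and the importances and names are taken to be aligned by index. For example, that holds for a fitted scikit-learn tree's `feature_importances_` and the vectorizer's `get_feature_names_out()` (`get_feature_names()` in older scikit-learn releases). The toy values are invented and are not results from the email data.

```python
# Sketch: report features whose importance exceeds a threshold.
# Assumption: `importances` and `feature_names` are aligned by index, e.g. a
# fitted scikit-learn tree's `feature_importances_` and the vectorizer's
# vocabulary. The toy values below are made up for illustration only.
def find_outlier_features(importances, feature_names, threshold=0.2):
    """Return (index, word, importance) for every feature above `threshold`."""
    if len(importances) != len(feature_names):
        raise ValueError("importances and feature_names must have the same length")
    return [
        (index, name, importance)
        for index, (name, importance) in enumerate(zip(feature_names, importances))
        if importance > threshold
    ]


if __name__ == "__main__":
    toy_importances = [0.01, 0.05, 0.76, 0.18]
    toy_names = ["energi", "meet", "placeholderword", "price"]
    for index, word, importance in find_outlier_features(toy_importances, toy_names):
        print(index, word, importance)  # prints: 2 placeholderword 0.76
```

The comparison is strict (`>`), matching the "importance > 0.2" definition, so a feature with importance of exactly 0.2 is not reported.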

INSTRUCTOR NOTE:
Special Note: Depending on when you downloaded the code provided for find_signature.py, you may need to change lines 9-10 to read:
    words_file = "../text_learning/your_word_data.pkl"
    authors_file = "../text_learning/your_email_authors.pkl"
so that find_signature.py loads the files created by running vectorize_text.py.